CRM-20565 - Improve ajax dedupe lookups on contact add form #10341
colemanw commented May 12, 2017 (edited by civicrm-builder)
- CRM-20565: Better & more configurable dedupe lookups when adding a contact
I'm going to try to take a look at this next week.
Great, thanks @eileenmcnaughton.
$this->addRadio('contact_ajax_check_similar', ts('Check for Similar Contacts'), array(
  '1' => ts('While Typing'),
  '0' => ts('When Saving'),
  '2' => ts('Never'),
Note: In all existing installs this preference is already set to either '1' or '0'. These new options keep the original meaning of 1 and 0 so no upgrade script is needed.
Thoughts on this one, @eileenmcnaughton?
Sorry - I still have this on my radar - I'm expecting to do significant work on dedupes going into next quarter, with extracting to an extension & making all interaction with the dedupeFinder & merger go via the api being high priorities.
@eileenmcnaughton I've rebased this. Can you take a look?
I've started looking at this & in general I agree with the change. I'm still working through it so these are not final comments.
Contact.getmatches The former is this function - matches for the data we have. The second would be pairs of duplicates to potentially merge (like the merge screen uses, but less nasty). (The second obviously does not exist yet.)
In general the defaults for Supervised & Unsupervised seem to be the reverse of what they should be, i.e. Unsupervised is happy to give anyone the bash if there is a matching email, whereas Supervised is conservative about putting up options. I'm not quite sure how to deal with this, but as a user I feel like I would want matches as soon as I've entered a first name, & then the top matches should get increasingly specific as I provide more data. @lcdservices I think you understand the thinking behind Supervised vs Unsupervised, which I'm grappling with.
and if you add e.g.
Regarding 4 - I'm just working off what people have as the default out-of-the-box settings. I realise people can customise it but most don't. The experience would be that they are used to having it resolve when they enter last name & now it won't, so it would feel like a bug/regression to a lot of people.
(Your reply to 2 makes sense & I agree that is fine.)
I just tested and you're right about number 4. It's not very useful that it only works when all 3 fields are an exact match. I honestly don't know how that qualifies as a "supervised" rule, as anyone with exactly the same first and last name and email is unquestionably the same person. That said, what we had in place prior to this PR wasn't exactly gold-standard either. It matched on last name and nothing else, so Jan Smith would match Bob Smith, etc. Those are also not very useful results, and it was not at all configurable as it is now.
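The "only works when all 3 fields match" behaviour follows from how a weighted rule is evaluated: each matching field contributes a weight, and a contact counts as a match only when the total reaches the rule's threshold. A minimal sketch in Python of that idea. The weights, thresholds and function names here are illustrative assumptions, not values or code taken from this PR or from CiviCRM's shipped rules.

```python
from typing import Dict

# Hypothetical per-field weights, chosen only for illustration.
WEIGHTS: Dict[str, int] = {"first_name": 5, "last_name": 7, "email": 10}

# A threshold equal to the sum of all weights forces every field to match.
STRICT_THRESHOLD = 22


def rule_score(candidate: Dict[str, str], entered: Dict[str, str],
               weights: Dict[str, int]) -> int:
    """Sum the weights of the fields where the entered value equals the candidate's."""
    score = 0
    for field, weight in weights.items():
        value = entered.get(field, "").strip().lower()
        # Empty input never counts as a match.
        if value and value == candidate.get(field, "").strip().lower():
            score += weight
    return score


def is_match(candidate: Dict[str, str], entered: Dict[str, str],
             weights: Dict[str, int], threshold: int) -> bool:
    """A candidate matches when its score reaches the rule's threshold."""
    return rule_score(candidate, entered, weights) >= threshold
```

With these assumed numbers, typing "Bob Smith" against an existing "Jan Smith" scores 7 (last name only): no suggestion under the strict threshold, but a suggestion if the threshold were lowered to 7, which is roughly the old last-name-only behaviour.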
My instinct is that the Supervised & Unsupervised rules are messed up & were implemented in reverse of how they were conceived, which is why I was hoping to get input from @lcdservices. I feel like this WOULD feel like a regression to some users as is, & I feel like we should consider adding a third rule type that exists by default, called 'InputAssistance' or something, & making it treat a match on last name or email as sufficient. Overall I feel like this might not be universally better & we should probably solicit wider input, possibly from the dev list, although it's more a UI question than a dev question. Our only other list seems to be 'partners', which is definitely not right, and please not a double post.... From a code point of view I'm happy with the code, apart from wanting to dig a bit further into the admin form side of things, and I agree it is an improvement. I just realised I haven't tested that we still do awful ghastly things to people when they choose 'on save' for the setting.
@eileenmcnaughton sorry -- missed the earlier ping on this. I'm happy to give my 2 cents on the rules... ;-)
One more comment after reading the Jira issue --
So to attempt to pull things together - I think both @lcdservices & I would be happy to see only TWO options
I think you'd struggle to find anyone who likes the on-save validation - but we could ask.
I agree about a made-for-deduping rule. I actually custom coded one of those for Woolman years ago. It did all kinds of fuzzy OR logic, taking phone numbers, nicknames, email, etc. into consideration. It wasn't very performant but it was awesome at nailing hard-to-find dupes.
I had another go at this and the usability seemed ok if I created a new for-purpose rule & used that instead of the 'Supervised' rule (in practice I did this by editing the Supervised rule to have a threshold of 5 & deleting CRM/Dedupe/BAO/QueryBuilder/IndividualSupervised.php). Using the existing built-in rule didn't give me any feedback until all 3 fields matched, which was a worse user experience than without the patch.
With my 'new custom rule' the performance was 'ok' even on the wmf database. By that I mean that on many names it resolved in real time, and even on 'John Smith' it didn't cause queries to spin off and bring the database down. I did get a very strange UI experience on the slow ones, as nothing showed up & then suddenly I had 3 boxes of matches.
However, the problem with the code as is is that the names are sorted by sort name. Since there are more than 20 Eileens in our database, the list of Eileens displayed once I entered first_name did not include me. Once I entered my email it fired again, but presented the list of names in the same order, so I could still only see the first 20 of the possible matches (which include me). I tried to make it sort by weight so the best matches would float to the top - the code is my effort, but while I managed to get the api to sort by weight the UI still did not.
Doing data entry, the whole thing would work better if the email address field were the first one you hit, because then it could try to match on that first. I also think it's realistically more important for data entry than fields like Job Title & Contact Type - tangential...
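The ordering problem described above can be sketched as follows: if candidates are sorted by sort name and then cut to the first 20, a strong match late in the alphabet is dropped, whereas sorting by weight first keeps it. This is a hypothetical Python illustration. The `weight` and `sort_name` keys and the `top_matches` helper are assumptions for the sketch, not the API or UI code from this PR.

```python
from typing import Dict, List, Union

Candidate = Dict[str, Union[str, int]]


def top_matches(candidates: List[Candidate], limit: int = 20,
                by_weight: bool = True) -> List[Candidate]:
    """Return at most `limit` candidates.

    by_weight=True  : strongest matches first, sort name breaks ties.
    by_weight=False : plain alphabetical order (the behaviour complained about).
    """
    if by_weight:
        # Negate the weight so that higher weights sort first.
        key = lambda c: (-int(c["weight"]), str(c["sort_name"]))
    else:
        key = lambda c: str(c["sort_name"])
    return sorted(candidates, key=key)[:limit]


# 25 weak matches (first name only) plus one strong match (name and email),
# which happens to sort last alphabetically.
weak = [{"sort_name": f"Aardvark{i:02d}, Eileen", "weight": 5} for i in range(25)]
strong = {"sort_name": "Zebra, Eileen", "weight": 15}
candidates = weak + [strong]

alphabetical = top_matches(candidates, by_weight=False)
ranked = top_matches(candidates, by_weight=True)
```

Here `strong` is absent from `alphabetical` but is the first entry of `ranked`. The point is that the ordering has to be applied before the limit, on both the api side and whatever the UI does with the result.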
@colemanw what should we do with this? I think it seems like to get it mergeable we would need to
I think it's a nice improvement - but it also seems not quite mergeable at the moment & I don't think either of us is quite committed enough to get it over the line right now? If that is the case we should close the PR & track it from gitlab. We won't lose access to the changes, although if you ALSO delete the branch & we haven't linked to the commit specifically, we will struggle to find it again.
The trouble with adding a new rule is that we bump against the limitations of the rule categories. Currently we have "Supervised", "Unsupervised" and "General". We could create a new category but that's yet another can-o-worms to open. IMO we ought to create a new hard-coded rule called "Name or Email or Phone or Address" and make it the default Supervised rule. The current default Supervised rule isn't very good. For existing sites, if they have the current default "Name and Email" Supervised rule, we switch it. If they've configured a custom Supervised rule, we leave it alone. That could be done in a separate PR. Just to think out loud, I think the ideal fuzzy rule would do something like
@eileenmcnaughton you know more about optimizing queries on huge databases than I do. Is the above too crazy for large databases to handle?
OK - so to answer that I think I need to feel more comfortable with how the existing Supervised rule is used, i.e. audit the places. I realise my discomfort with the idea is about the places it is used. TODO audit this. My inclination is that adding a DataEntry category is safer because it involves changing behaviour in fewer places. Where it differs logically from the 'normal' concept of 'Supervised' is that we would want it to be more aggressive about suggestions, i.e. as you type 'Coleman', suggestions for 'Coleman' would appear (for a large DB they might tweak that, but on a small DB that's a good rule). Regarding whether it should be a hard-coded rule - I can do some tests. From my previous tests I'm not sure that the hard-coded rules are more efficient in this instance.
@colemanw I had an idea here. On one side of this patch we have a significant UI improvement; on the other we have a change in behaviour in the selection of which contacts are duplicates, which makes me uncomfortable. Could we just get the first part committed, i.e. write a (deprecated) api or ajax call that does the same lookup as right now & use that, & then review (if we choose) switching the mechanism?
@eileenmcnaughton well the current implementation of the ajax lookup leaves a lot to be desired. It matches on last_name and nothing else.
@colemanw Yep & we could get that merged & then consider improving the rules.
Per discussion on civicrm#10341. Since the default dedupe rules are inadequate for this, we'll just use the contact api for now.
@eileenmcnaughton done. I also added some throttling to prevent multiple messages popping up if the user fills out the form quickly.
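For readers unfamiliar with the idea, throttling of this kind can be sketched as below: a lookup (and its popup) is allowed through only if enough time has passed since the last one that ran. This is a generic Python illustration under assumed names (`Throttle`, `min_interval`, an injectable `clock`). The form's actual client-side code is not shown in this thread and may work differently, for example by debouncing instead.

```python
import time
from typing import Callable, Optional


class Throttle:
    """Run an action at most once per `min_interval` seconds; drop extra calls."""

    def __init__(self, min_interval: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.min_interval = min_interval
        self.clock = clock  # injectable so the behaviour can be tested without sleeping
        self._last_run: Optional[float] = None

    def __call__(self, action: Callable[[], None]) -> bool:
        """Invoke `action` if allowed. Return True if it ran, False if it was dropped."""
        now = self.clock()
        if self._last_run is not None and now - self._last_run < self.min_interval:
            return False
        self._last_run = now
        action()
        return True
```

A usage example with a fake clock: with `min_interval=1.0`, calls at t=0.0 and t=1.5 run, while a call at t=0.5 is dropped, so a user tabbing quickly through the name fields would see one message rather than several.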
test this please
I just tested this & it feels better from a UI POV & I didn't spot any issues - I think it's good to merge.
Merging as per tag.
@colemanw @eileenmcnaughton please close the JIRA ticket. Thanks!